{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第二题：神经网络：线性回归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实验内容：\n",
    "1. 学会梯度下降的基本思想\n",
    "2. 学会使用梯度下降求解线性回归\n",
    "3. 了解归一化处理的作用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 线性回归\n",
    "\n",
    "<img src=\"https://davidham3.github.io/blog/2018/09/11/logistic-regression/Fig0.png\" width=300>\n",
    "\n",
    "我们来完成最简单的线性回归，上图是一个最简单的神经网络，一个输入层，一个输出层，没有激活函数。  \n",
    "我们记输入为$X \\in \\mathbb{R}^{n \\times m}$，输出为$Z \\in \\mathbb{R}^{n}$。输入包含了$n$个样本，$m$个特征，输出是对这$n$个样本的预测值。  \n",
    "输入层到输出层的权重和偏置，我们记为$W \\in \\mathbb{R}^{m}$和$b \\in \\mathbb{R}$。  \n",
    "输出层没有激活函数，所以上面的神经网络的前向传播过程写为：\n",
    "\n",
    "$$\n",
    "Z = XW + b\n",
    "$$\n",
    "\n",
    "我们使用均方误差作为模型的损失函数\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, \\hat{y}) = \\frac{1}{n} \\sum^n_{i=1}(y_i - \\hat{y_i})^2\n",
    "$$\n",
    "\n",
    "我们通过调整参数$W$和$b$来降低均方误差，或者说是以降低均方误差为目标，学习参数$W$和参数$b$。当均方误差下降的时候，我们认为当前的模型的预测值$Z$与真值$y$越来越接近，也就是说模型正在学习如何让自己的预测值变得更准确。\n",
    "\n",
    "在前面的课程中，我们已经学习了这种线性回归模型可以使用最小二乘法求解，最小二乘法在求解数据量较小的问题的时候很有效，但是最小二乘法的时间复杂度很高，一旦数据量变大，效率很低，实际应用中我们会使用梯度下降等基于梯度的优化算法来求解参数$W$和参数$b$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 梯度下降\n",
    "\n",
    "梯度下降是一种常用的优化算法，通俗来说就是计算出参数的梯度（损失函数对参数的偏导数的导数值），然后将参数减去参数的梯度乘以一个很小的数（下面的公式），来改变参数，然后重新计算损失函数，再次计算梯度，再次进行调整，通过一定次数的迭代，参数就会收敛到最优点附近。\n",
    "\n",
    "在我们的这个线性回归问题中，我们的参数是$W$和$b$，使用以下的策略更新参数：\n",
    "\n",
    "$$\n",
    "W := W - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial W}\n",
    "$$\n",
    "\n",
    "$$\n",
    "b := b - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial b}\n",
    "$$\n",
    "\n",
    "其中，$\\alpha$ 是学习率，一般设置为0.1，0.01等。\n",
    "\n",
    "接下来我们会求解损失函数对参数的偏导数。\n",
    "\n",
    "损失函数MSE记为：\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, Z) = \\frac{1}{n} \\sum^n_{i = 1} (y_i - Z_i)^2\n",
    "$$\n",
    "\n",
    "其中，$Z \\in \\mathbb{R}^{n}$是我们的预测值，也就是神经网络输出层的输出值。这里我们有$n$个样本，实际上是将$n$个样本的预测值与他们的真值相减，取平方后加和。\n",
    "\n",
    "我们计算损失函数对参数$W$的偏导数，根据链式法则，可以将偏导数拆成两项，分别求解后相乘：\n",
    "\n",
    "**这里我们以矩阵的形式写出推导过程，感兴趣的同学可以尝试使用单个样本进行推到，然后推广到矩阵形式**\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial W} &= \\frac{\\partial \\mathrm{loss}}{\\partial Z} \\frac{\\partial Z}{\\partial W}\\\\\n",
    "&= - \\frac{2}{n} X^\\mathrm{T} (y - Z)\\\\\n",
    "&= \\frac{2}{n} X^\\mathrm{T} (Z - y)\n",
    "\\end{aligned}$$\n",
    "\n",
    "同理，求解损失函数对参数$b$的偏导数:\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial b} &= \\frac{\\partial \\mathrm{loss}}{\\partial Z} \\frac{\\partial Z}{\\partial b}\\\\\n",
    "&= - \\frac{2}{n} \\sum^n_{i=1}(y_i - Z_i)\\\\\n",
    "&= \\frac{2}{n} \\sum^n_{i=1}(Z_i - y_i)\n",
    "\\end{aligned}$$\n",
    "\n",
    "**因为参数$b$对每个样本的损失值都有贡献，所以我们需要将所有样本的偏导数都加和。**\n",
    "\n",
    "其中，$\\frac{\\partial \\mathrm{loss}}{\\partial W} \\in \\mathbb{R}^{m}$，$\\frac{\\partial \\mathrm{loss}}{\\partial b} \\in \\mathbb{R}$，求解得到的梯度的维度与参数一致。\n",
    "\n",
    "完成上式两个梯度的计算后，就可以使用梯度下降法对参数进行更新了。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "训练神经网络的基本思路：\n",
    "\n",
    "1. 首先对参数进行初始化，对参数进行随机初始化（也就是取随机值）\n",
    "2. 将样本输入神经网络，计算神经网络预测值 $Z$\n",
    "3. 计算损失值MSE\n",
    "4. 通过 $Z$ 和 $y$ ，以及 $X$ ，计算参数的梯度\n",
    "5. 使用梯度下降更新参数\n",
    "6. 循环1-5步，**在反复迭代的过程中可以看到损失值不断减小的现象，如果没有下降说明出了问题**\n",
    "\n",
    "接下来我们来实现这个最简单的神经网络。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 导入数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用kaggle房价数据，选3列作为特征"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "# 读取数据\n",
    "data = pd.read_csv('data/kaggle_house_price_prediction/kaggle_hourse_price_train.csv')\n",
    "\n",
    "# 使用这3列作为特征\n",
    "features = ['LotArea', 'BsmtUnfSF', 'GarageArea']\n",
    "target = 'SalePrice'\n",
    "data = data[features + [target]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 数据预处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "40%做测试集，60%做训练集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "trainX, testX, trainY, testY = train_test_split(data[features], data[target], test_size = 0.4, random_state = 32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "训练集876个样本，3个特征，测试集584个样本，3个特征"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainX.shape, trainY.shape, testX.shape, testY.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 参数初始化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里，我们要初始化参数$W$和$b$，其中$W \\in \\mathbb{R}^m$，$b \\in \\mathbb{R}$，初始化的策略是将$W$初始化成一个随机数矩阵，参数$b$为0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def initialize(m):\n",
    "    '''\n",
    "    参数初始化，将W初始化成一个随机向量，b是一个长度为1的向量\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    m: int, 特征数\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    W: np.ndarray, shape = (m, ), 参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, ), 参数b\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    # 指定随机种子，这样生成的随机数就是固定的了，这样就可以与下面的测试样例进行比对\n",
    "    np.random.seed(32)\n",
    "    \n",
    "    W = np.random.normal(size = (m, )) * 0.01\n",
    "    \n",
    "    b = np.zeros((1, ))\n",
    "    \n",
    "    return W, b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 前向传播"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里，我们要完成输入矩阵$X$在神经网络中的计算，也就是完成 $Z = XW + b$ 的计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def forward(X, W, b):\n",
    "    '''\n",
    "    前向传播，计算Z = XW + b\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m)，输入的数据\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，权重\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，偏置\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    Z: np.ndarray, shape = (n, )，线性组合后的值\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    # 完成Z = XW + b的计算\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return Z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "tmp = forward(trainX, Wt, bt)\n",
    "print(tmp.mean()) # -28.37377"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 损失函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来编写损失函数，我们以均方误差(MSE)作为损失函数，需要大家实现MSE的计算：\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, Z) = \\frac{1}{n} \\sum^n_{i = 1} (y_i - Z_i)^2\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mse(y_true, y_pred):\n",
    "    '''\n",
    "    MSE，均方误差\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y_true: np.ndarray, shape = (n, )，真值\n",
    "    \n",
    "    y_pred: np.ndarray, shape = (n, )，预测值\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    loss: float，损失值\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    # 计算MSE\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "tmp = mse(trainY, forward(trainX, Wt, bt))\n",
    "print(tmp) # 39381033680.5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 反向传播"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里我们要完成梯度的计算，也就是计算出损失函数对参数的偏导数的导数值：\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial W} = \\frac{2}{n} X^\\mathrm{T} (Z - y)\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial b} = \\frac{2}{n} \\sum^n_{i=1}(Z_i - y_i)\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient(X, Z, y_true):\n",
    "    '''\n",
    "    计算梯度\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m)，输入的数据\n",
    "    \n",
    "    Z: np.ndarray, shape = (n, )，线性组合后的值\n",
    "    \n",
    "    y_true: np.ndarray, shape = (n, )，真值\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    dW, np.ndarray, shape = (m, ), 参数W的梯度\n",
    "    \n",
    "    db, np.ndarray, shape = (1, ), 参数b的梯度\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    n = len(y_true)\n",
    "    \n",
    "    # 计算W的梯度\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    # 计算b的梯度\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return dW, db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "Zt = forward(trainX, Wt, bt)\n",
    "dWt, dbt = compute_gradient(trainX, Zt, trainY)\n",
    "print(dWt.shape) # (3,)\n",
    "print(dWt.mean()) # -1532030241.25\n",
    "print(dbt.mean()) # -364308.555764"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. 梯度下降"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这部分需要实现梯度下降的函数\n",
    "$$\n",
    "W := W - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial W}\n",
    "$$\n",
    "\n",
    "$$\n",
    "b := b - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial b}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def update(dW, db, W, b, learning_rate):\n",
    "    '''\n",
    "    梯度下降，参数更新，不需要返回值，W和b实际上是以引用的形式传入到函数内部，\n",
    "    函数内改变W和b会直接影响到它们本身，所以不需要返回值\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    dW, np.ndarray, shape = (m, ), 参数W的梯度\n",
    "    \n",
    "    db, np.ndarray, shape = (1, ), 参数b的梯度\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，权重\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，偏置\n",
    "    \n",
    "    learning_rate, float，学习率\n",
    "    \n",
    "    '''\n",
    "    # 更新W\n",
    "    W -= learning_rate * dW\n",
    "    \n",
    "    # 更新b\n",
    "    b -= learning_rate * db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print(Wt.mean()) # 0.00405243937693\n",
    "print(bt.mean()) # 0.0\n",
    "\n",
    "Zt = forward(trainX, Wt, bt)\n",
    "dWt, dbt = compute_gradient(trainX, Zt, trainY)\n",
    "update(dWt, dbt, Wt, bt, 0.01)\n",
    "\n",
    "print(Wt.shape) # (3,)\n",
    "print(Wt.mean()) # 15320302.4166\n",
    "print(bt.mean()) # 3643.08555764"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "完成整个参数更新的过程，先计算梯度，再更新参数，将compute_gradient和update组装在一起。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def backward(X, Z, y_true, W, b, learning_rate):\n",
    "    '''\n",
    "    使用compute_gradient和update函数，先计算梯度，再更新参数\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m)，输入的数据\n",
    "    \n",
    "    Z: np.ndarray, shape = (n, )，线性组合后的值\n",
    "    \n",
    "    y_true: np.ndarray, shape = (n, )，真值\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，权重\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，偏置\n",
    "    \n",
    "    learning_rate, float，学习率\n",
    "    \n",
    "    '''\n",
    "    # 计算参数的梯度\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    # 更新参数\n",
    "    # YOUR CODE HERE\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print(Wt.mean()) # 0.00405243937693\n",
    "print(bt.mean()) # 0.0\n",
    "\n",
    "Zt = forward(trainX, Wt, bt)\n",
    "backward(trainX, Zt, trainY, Wt, bt, 0.01)\n",
    "\n",
    "print(Wt.shape) # (3,)\n",
    "print(Wt.mean()) # 15320302.4166\n",
    "print(bt.mean()) # 3643.08555764"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. 训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(trainX, trainY, testX, testY, W, b, epochs, learning_rate = 0.01, verbose = False):\n",
    "    '''\n",
    "    训练，我们要迭代epochs次，每次迭代的过程中，做一次前向传播和一次反向传播，更新参数\n",
    "    同时记录训练集和测试集上的损失值，后面画图用。然后循环往复，直到达到最大迭代次数epochs\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    trainX: np.ndarray, shape = (n, m), 训练集\n",
    "    \n",
    "    trainY: np.ndarray, shape = (n, ), 训练集标记\n",
    "    \n",
    "    testX: np.ndarray, shape = (n_test, m)，测试集\n",
    "    \n",
    "    testY: np.ndarray, shape = (n_test, )，测试集的标记\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    epochs: int, 要迭代的轮数\n",
    "    \n",
    "    learning_rate: float, default 0.01，学习率\n",
    "    \n",
    "    verbose: boolean, default False，是否打印损失值\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    training_loss_list: list(float)，每迭代一次之后，训练集上的损失值\n",
    "    \n",
    "    testing_loss_list: list(float)，每迭代一次之后，测试集上的损失值\n",
    "    \n",
    "    '''\n",
    "    training_loss_list = []\n",
    "    testing_loss_list = []\n",
    "    \n",
    "    for epoch in range(epochs):\n",
    "        \n",
    "        # 这里我们要将神经网络的输出值保存起来，因为后面反向传播的时候需要这个值\n",
    "        Z = forward(trainX, W, b)\n",
    "        \n",
    "        # 计算训练集的损失值\n",
    "        training_loss = mse(trainY, Z)\n",
    "        \n",
    "        # 计算测试集的损失值        \n",
    "        testing_loss = mse(testY, forward(testX, W, b))\n",
    "        \n",
    "        # 将损失值存起来\n",
    "        training_loss_list.append(training_loss)\n",
    "        testing_loss_list.append(testing_loss)\n",
    "        \n",
    "        # 打印损失值，debug用\n",
    "        if verbose:\n",
    "            print('epoch %s training loss: %s'%(epoch+1, training_loss))\n",
    "            print('epoch %s testing loss: %s'%(epoch+1, testing_loss))\n",
    "            print()\n",
    "        \n",
    "        # 反向传播，参数更新\n",
    "        backward(trainX, Z, trainY, W, b, learning_rate)\n",
    "        \n",
    "    return training_loss_list, testing_loss_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print(Wt.mean())          # 0.00405243937693\n",
    "print(bt.mean())          # 0.0\n",
    "\n",
    "training_loss_list, testing_loss_list = train(trainX, trainY, testX, testY, Wt, bt, 2, learning_rate = 0.01, verbose = False)\n",
    "\n",
    "print(training_loss_list) # [39381033680.460075, 3.3902307664083424e+23]\n",
    "print(testing_loss_list)  # [38555252685.093872, 4.1516070070405267e+23]\n",
    "print(Wt.mean())          # -5.70557904608e+13\n",
    "print(bt.mean())          # -8824267814.59"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. 检查\n",
    "\n",
    "编写一个绘制损失值变化曲线的函数\n",
    "\n",
    "一般我们通过绘制损失函数的变化曲线来判断模型的拟合状态。\n",
    "\n",
    "一般来说，随着迭代轮数的增加，训练集的loss在下降，而测试集的loss在上升，这说明我们正在不断地让模型在训练集上表现得越来越好，在测试集上表现得越来越糟糕，这就是过拟合的体现。  \n",
    "\n",
    "如果训练集loss和测试集loss共同下降，这就是我们想要的结果，说明模型正在很好的学习。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_loss_curve(training_loss_list, testing_loss_list):\n",
    "    '''\n",
    "    绘制损失值变化曲线\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    training_loss_list: list(float)，每迭代一次之后，训练集上的损失值\n",
    "    \n",
    "    testing_loss_list: list(float)，每迭代一次之后，测试集上的损失值\n",
    "    \n",
    "    '''\n",
    "    plt.figure(figsize = (10, 6))\n",
    "    plt.plot(training_loss_list, label = 'training loss')\n",
    "    plt.plot(testing_loss_list, label = 'testing loss')\n",
    "    plt.xlabel('epoch')\n",
    "    plt.ylabel('loss')\n",
    "    plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面这些函数就是完成整个神经网络需要的函数了\n",
    "\n",
    "|函数名|功能|\n",
    "|-|-|\n",
    "|initialize | 参数初始化|\n",
    "|forward | 给定数据，计算神经网络的输出值|\n",
    "|mse | 给定真值，计算神经网络的预测值与真值之间的差距|\n",
    "|backward | 计算参数的梯度，并实现参数的更新|\n",
    "|compute_gradient | 计算参数的梯度|\n",
    "|update | 参数的更新|\n",
    "|backward | 计算参数梯度，并且更新参数|\n",
    "|train | 训练神经网络|\n",
    "|plot_loss_curve | 绘制损失函数的变化曲线|\n",
    "\n",
    "我们使用参数初始化函数和训练函数，完成神经网络的训练。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 特征数m\n",
    "m = trainX.shape[1]\n",
    "\n",
    "# 参数初始化\n",
    "W, b = initialize(m)\n",
    "\n",
    "# 训练20轮，学习率为0.01\n",
    "training_loss_list, testing_loss_list = train(trainX, trainY, testX, testY, W, b, 20, learning_rate = 0.01, verbose = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "绘制损失值的变化曲线"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_loss_curve(training_loss_list, testing_loss_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过打印损失的信息我们可以看到损失值持续上升，这就说明哪里出了问题。但是如果所有的测试样例都通过了，就说明我们的实现是没有问题的。运行下面的测试样例，观察哪里出了问题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print('epoch 0, W:', Wt)  # [-0.00348894  0.00983703  0.00580923]\n",
    "print('epoch 0, b:', bt)  # [ 0.]\n",
    "print()\n",
    "\n",
    "Zt = forward(trainX, Wt, bt)\n",
    "dWt, dbt = compute_gradient(trainX, Zt, trainY)\n",
    "print('dWt:', dWt) # [ -4.18172940e+09  -2.19880296e+08  -1.94481031e+08]\n",
    "print('db:', dbt) # -364308.555764\n",
    "print()\n",
    "\n",
    "update(dWt, dbt, Wt, bt, 0.01)\n",
    "print('epoch 1, W:', Wt)  # [ 41817293.96016914   2198802.97412493   1944810.31544994]\n",
    "print('epoch 1, b:', bt)  # [ 3643.08555764]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，我们最开始的参数都是在 $10^{-3}$ 这个数量级上，而第一轮迭代时计算出的梯度的数量级在 $10^8$ 左右，这就导致使用梯度下降更新的时候，让参数变成了 $10^6$ 这个数量级左右（学习率为0.01）。产生这样的问题的主要原因是：我们的原始数据 $X$ 没有经过适当的处理，直接扔到了神经网络中进行训练，导致在计算梯度时，由于 $X$ 的数量级过大，导致梯度的数量级变大，在参数更新时使得参数的数量级不断上升，导致参数无法收敛。\n",
    "\n",
    "解决的方法也很简单，对参数进行归一化处理，将其标准化，使均值为0，缩放到 $[-1, 1]$附近。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. 标准化处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "标准化处理和第一题一样"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "stand = StandardScaler()\n",
    "trainX_normalized = stand.fit_transform(trainX)\n",
    "testX_normalized = stand.transform(testX)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "重新训练模型，这次我们迭代40轮，学习率设置为0.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = trainX.shape[1]\n",
    "W, b = initialize(m)\n",
    "training_loss_list, testing_loss_list = train(trainX_normalized, trainY, testX_normalized, testY, W, b, 40, learning_rate = 0.1, verbose = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "打印损失值变化曲线"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_loss_curve(training_loss_list, testing_loss_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算测试集上的MSE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction = forward(testX_normalized, W, b)\n",
    "mse(testY, prediction) ** 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
